Results 1 - 20 of 23
1.
Cell ; 187(7): 1801-1818.e20, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38471500

ABSTRACT

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.


Subjects
Bile Acids and Salts , Gastrointestinal Microbiome , Metabolomics , Tandem Mass Spectrometry , Animals , Humans , Bile Acids and Salts/chemistry , Metabolomics/methods , Polyamines , Tandem Mass Spectrometry/methods , Chemical Databases
2.
Anal Chem ; 96(6): 2590-2598, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38294426

ABSTRACT

High-resolution mass spectrometry (HRMS) is a prominent analytical tool that characterizes chlorinated disinfection byproducts (Cl-DBPs) in an unbiased manner. Due to the diversity of chemicals, complex background signals, and the inherent analytical fluctuations of HRMS, conventional isotopic pattern (³⁷Cl/³⁵Cl), mass defect, and direct molecular formula (MF) prediction are insufficient for accurate recognition of the diverse Cl-DBPs in real environmental samples. This work proposes a novel strategy to recognize Cl-containing chemicals based on machine learning. Our hierarchical machine learning framework has two random forest-based models: the first layer is a binary classifier to recognize Cl-containing chemicals, and the second layer is a multiclass classifier to annotate the number of Cl present. This model was trained using ∼1.4 million distinctive MFs from PubChem. Evaluated on over 14,000 unique MFs from NIST20, this machine learning model achieved 93.3% accuracy in recognizing Cl-containing MFs (Cl-MFs) and 92.9% accuracy in annotating the number of Cl for Cl-MFs. Furthermore, the trained model was integrated into ChloroDBPFinder, a standalone R package for the streamlined processing of LC-HRMS data and annotating both known and unknown Cl-containing compounds. Tested on existing Cl-DBP data sets related to aspartame chlorination in tap water, our ChloroDBPFinder efficiently extracted 159 Cl-containing DBP features and tentatively annotated the structures of 10 Cl-DBPs via molecular networking. In another application to a chlorinated humic substance, ChloroDBPFinder extracted 79 high-quality Cl-DBPs and tentatively annotated six compounds. In summary, our proposed machine learning strategy and the developed ChloroDBPFinder provide an advanced solution to identifying Cl-containing compounds in nontargeted analysis of water samples. It is freely available on GitHub (https://github.com/HuanLab/ChloroDBPFinder).
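For orientation, the conventional ³⁷Cl/³⁵Cl isotopic-pattern check that the abstract describes as insufficient on its own can be sketched as follows. This is a minimal illustration and not the ChloroDBPFinder model: the function name, the ppm tolerance, and the ratio tolerance are hypothetical choices, and the check only covers a singly charged ion carrying one chlorine atom.

```python
# Mass difference between 37Cl and 35Cl (Da) and the natural abundance
# ratio expected for the M+2 peak of a compound with a single Cl atom.
CL_MASS_DIFF = 36.96590259 - 34.96885268   # about 1.99705 Da
CL_RATIO = 0.2424 / 0.7576                 # about 0.32

def looks_like_one_chlorine(mono_mz, mono_intensity, peaks,
                            ppm_tol=10.0, ratio_tol=0.10):
    """Return True if `peaks` holds an M+2 peak consistent with one Cl.

    peaks: list of (mz, intensity) tuples from the same MS1 scan.
    ppm_tol and ratio_tol are illustrative defaults (assumptions).
    """
    target = mono_mz + CL_MASS_DIFF
    for mz, intensity in peaks:
        within_mass = abs(mz - target) / target * 1e6 <= ppm_tol
        within_ratio = abs(intensity / mono_intensity - CL_RATIO) <= ratio_tol
        if within_mass and within_ratio:
            return True
    return False
```

A rule of this kind is sensitive to background signals and intensity fluctuations, which is the limitation the abstract gives as motivation for a learned classifier.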

3.
ACS Infect Dis ; 10(1): 107-119, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38054469

ABSTRACT

Cholesterol is a critical growth substrate for Mycobacterium tuberculosis (Mtb) during infection, and the cholesterol catabolic pathway has been targeted for the development of new antimycobacterial agents. A key metabolite in cholesterol catabolism is 3aα-H-4α(3'-propanoate)-7aβ-methylhexahydro-1,5-indanedione (HIP). Many of the HIP metabolites are acyl-coenzyme A (CoA) thioesters, whose accumulation in deletion mutants can cause cholesterol-mediated toxicity. We used LC-MS/MS analysis to demonstrate that deletion of genes involved in HIP catabolism leads to acyl-CoA accumulation with concomitant depletion of free CoASH, leading to dysregulation of central metabolic pathways. CoASH and acyl-CoAs inhibited PanK, the enzyme that catalyzes the first step in the transformation of pantothenate to CoASH. Inhibition was competitive with respect to ATP with Kic values ranging from 9 µM for CoASH to 57 µM for small acyl-CoAs and 180 ± 30 µM for cholesterol-derived acyl-CoA. These findings link two critical metabolic pathways and suggest that therapeutics targeting cholesterol catabolic enzymes could both prevent the utilization of an important growth substrate and simultaneously sequester CoA from essential cellular processes, leading to bacterial toxicity.


Subjects
Mycobacterium tuberculosis , Tandem Mass Spectrometry , Liquid Chromatography , Cholesterol/metabolism , Coenzyme A/metabolism
4.
Nat Commun ; 14(1): 8488, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38123557

ABSTRACT

Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.


Subjects
Access to Information , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Metabolomics/methods , Gene Library , Cluster Analysis
5.
Cell Rep ; 42(8): 112997, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37611587

ABSTRACT

Colorectal cancer (CRC) is driven by genomic alterations in concert with dietary influences, with the gut microbiome implicated as an effector in disease development and progression. While meta-analyses have provided mechanistic insight into patients with CRC, study heterogeneity has limited causal associations. Using multi-omics studies on genetically controlled cohorts of mice, we identify diet as the major driver of microbial and metabolomic differences, with reductions in α diversity and widespread changes in cecal metabolites seen in high-fat diet (HFD)-fed mice. In addition, non-classic amino acid conjugation of the bile acid cholic acid (AA-CA) increased with HFD. We show that AA-CAs impact intestinal stem cell growth and demonstrate that Ileibacterium valens and Ruminococcus gnavus are able to synthesize these AA-CAs. This multi-omics dataset implicates diet-induced shifts in the microbiome and the metabolome in disease progression and has potential utility in future diagnostic and therapeutic developments.


Subjects
Colorectal Neoplasms , Gastrointestinal Microbiome , Microbiota , Animals , Mice , Bile Acids and Salts , Metabolome
6.
Anal Chem ; 95(35): 13018-13028, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37603462

ABSTRACT

The purity of tandem mass spectrometry (MS/MS) is essential to MS/MS-based metabolite annotation and unknown exploration. This work presents a de novo approach to cleaning chimeric MS/MS spectra generated in liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics. The assumption is that true fragments and their precursors are well correlated across the samples in a study, while false or contamination fragments are rather independent. Using data simulation, this work starts with an investigation of the negative effects of chimeric MS/MS spectra on spectral similarity analysis and molecular networking. Next, the characteristics of true and false fragments in chimeric MS/MS spectra were investigated using MS/MS of chemical standards. We recognized three fragment peak attributes indicative of whether a peak is a false fragment, including (1) intensity ratio fluctuation, (2) appearance rate, and (3) relative intensity. Using these attributes, we tested three machine learning models and identified XGBoost as the best model to achieve an area under the precision-recall curve of 0.98 for a clear separation between true and false fragments. Based on the trained model, we constructed an automated bioinformatic platform, DNMS2Purifier (short for de novo MS2Purifier), for metabolic features from metabolomics studies. DNMS2Purifier recognizes and processes chimeric MS/MS spectra without additional sample analysis or library confirmation. DNMS2Purifier was evaluated on a metabolomics data set generated with different MS/MS precursor isolation windows. It successfully captured the increase in the number of false fragments from the increased isolation window. DNMS2Purifier was also compared to MS2Purifier, an existing MS/MS spectral cleaning tool based on the addition of data-independent acquisition (DIA) analysis. Results indicated that DNMS2Purifier uniquely recognizes false fragments, which complements the previous DIA-based approach. Finally, DNMS2Purifier was demonstrated using a real experimental metabolomics study, showing improved MS/MS spectral quality and leading to an improved spectral match ratio and molecular networking outcome.


Subjects
Metabolomics , Tandem Mass Spectrometry , Liquid Chromatography , Spectrum Analysis , Computational Biology
7.
Nat Methods ; 20(6): 881-890, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37055660

ABSTRACT

A substantial fraction of metabolic features remains undetermined in mass spectrometry (MS)-based metabolomics, and molecular formula annotation is the starting point for unraveling their chemical identities. Here we present bottom-up tandem MS (MS/MS) interrogation, a method for de novo formula annotation. Our approach prioritizes MS/MS-explainable formula candidates, implements machine-learned ranking and offers false discovery rate estimation. Compared with the mathematically exhaustive formula enumeration, our approach shrinks the formula candidate space by 42.8% on average. Method benchmarking on annotation accuracy was systematically carried out on reference MS/MS libraries and real metabolomics datasets. Applied to 155,321 recurrent unidentified spectra, our approach confidently annotated >5,000 novel molecular formulae absent from chemical databases. Beyond the level of individual metabolic features, we combined bottom-up MS/MS interrogation with global optimization to refine formula annotations while revealing peak interrelationships. This approach allowed the systematic annotation of 37 fatty acid amide molecules in human fecal data. All bioinformatics pipelines are available in the standalone software BUDDY (https://github.com/HuanLab/BUDDY).


Subjects
Software , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Metabolomics/methods , Computational Biology , Chemical Databases
8.
Environ Health Perspect ; 131(3): 37009, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36913238

ABSTRACT

BACKGROUND: Due to many substances in the human exposome, there is a dearth of exposure and toxicity information available to assess potential health risks. Quantification of all trace organics in the biological fluids seems impossible and costly, regardless of the high individual exposure variability. We hypothesized that the blood concentration (CB) of organic pollutants could be predicted via their exposure and chemical properties. Developing a prediction model on the annotation of chemicals in human blood can provide new insight into the distribution and extent of exposures to a wide range of chemicals in humans. OBJECTIVES: Our objective was to develop a machine learning (ML) model to predict blood concentrations (CBs) of chemicals and prioritize chemicals of health concern. METHODS: We curated the CBs of compounds mostly measured at population levels and developed an ML model for chemical CB predictions by considering chemical daily exposure (DE) and exposure pathway indicators (δij), half-lives (t1/2), and volume of distribution (Vd). Three ML models, including random forest (RF), artificial neural network (ANN) and support vector regression (SVR) were compared. The toxicity potential or prioritization of each chemical was represented as a bioanalytical equivalency (BEQ) and its percentage (BEQ%) estimated based on the predicted CB and ToxCast bioactivity data. We also retrieved the top 25 most active chemicals in each assay to further observe changes in the BEQ% after the exclusion of the drugs and endogenous substances. RESULTS: We curated the CBs of 216 compounds primarily measured at population levels. RF outperformed the ANN and SVR models with the root mean square error (RMSE) of 1.66 and 2.07 µM, the mean absolute error (MAE) values of 1.28 and 1.56 µM, the mean absolute percentage error (MAPE) of 0.29 and 0.23, and R² of 0.80 and 0.72 across test and testing sets. Subsequently, the human CBs of 7,858 ToxCast chemicals were successfully predicted, ranging from 1.29×10⁻⁶ to 1.79×10⁻² µM. The predicted CBs were then combined with ToxCast in vitro bioassays to prioritize the ToxCast chemicals across 12 in vitro assays with important toxicological end points. It is interesting that we found the most active compounds to be food additives and pesticides rather than widely monitored environmental pollutants. DISCUSSION: We have shown that the accurate prediction of "internal exposure" from "external exposure" is possible, and this result can be quite useful in the risk prioritization. https://doi.org/10.1289/EHP11305.
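The four error measures reported in the abstract (RMSE, MAE, MAPE, and R²) have standard definitions, sketched below. This is only an illustration of those definitions: the function name is hypothetical, and the abstract does not say whether the concentrations were transformed before scoring, so the plain untransformed form used here is an assumption.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (as a fraction) and R^2 for paired values.

    Assumes y_true contains no zeros (needed for MAPE) and is not constant
    (needed for R^2).
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }
```

MAPE is returned as a fraction rather than a percentage, which matches the scale of the values quoted in the abstract (0.29 and 0.23).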


Subjects
Environmental Pollutants , Exposome , Pesticides , Humans , Random Forest , Environmental Pollutants/toxicity , Pesticides/analysis
9.
Chem Commun (Camb) ; 58(72): 9979-9990, 2022 Sep 08.
Article in English | MEDLINE | ID: mdl-35997016

ABSTRACT

Advancements in computer science and software engineering have greatly facilitated mass spectrometry (MS)-based untargeted metabolomics. Nowadays, gigabytes of metabolomics data are routinely generated from MS platforms, containing condensed structural and quantitative information from thousands of metabolites. Manual data processing is almost impossible due to the large data size. Therefore, in the "omics" era, we are faced with new challenges, the big data challenges of how to accurately and efficiently process the raw data, extract the biological information, and visualize the results from the gigantic amount of collected data. Although important, proposing solutions to address these big data challenges requires broad interdisciplinary knowledge, which can be challenging for many metabolomics practitioners. Our laboratory in the Department of Chemistry at the University of British Columbia is committed to combining analytical chemistry, computer science, and statistics to develop bioinformatics tools that address these big data challenges. In this Feature Article, we elaborate on the major big data challenges in metabolomics, including data acquisition, feature extraction, quantitative measurements, statistical analysis, and metabolite annotation. We also introduce our recently developed bioinformatics solutions for these challenges. Notably, all of the bioinformatics tools and source codes are freely available on GitHub (https://www.github.com/HuanLab), along with revised and regularly updated content.


Subjects
Big Data , Tandem Mass Spectrometry , Computational Biology , Metabolomics/methods , Software , Tandem Mass Spectrometry/methods
10.
Nat Commun ; 13(1): 2510, 2022 May 06.
Article in English | MEDLINE | ID: mdl-35523965

ABSTRACT

Interrelating small molecules according to their aligned fragmentation spectra is central to tandem mass spectrometry-based untargeted metabolomics. Current alignment algorithms do not provide statistical significance, and compounds that have multiple delocalized structural differences often fail to have their fragment ions aligned. Here we align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE). SIMILE yields spectral-alignment-inferred structural connections in molecular networks that are not found with cosine-based scoring algorithms. In addition, it is now possible to rank spectral alignments based on p-values in the exploration of structural relationships between compounds and enhance the chemical connectivity that can be obtained with molecular networking.
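As a point of reference for the cosine-based scoring that the abstract contrasts SIMILE with, a plain cosine score between two centroided spectra can be sketched as below. It does not reproduce SIMILE, and it yields no p-value. The function name, the greedy one-to-one peak matching, and the m/z tolerance are illustrative assumptions.

```python
import math

def cosine_score(spec_a, spec_b, mz_tol=0.01):
    """Cosine similarity between two spectra given as lists of (mz, intensity).

    Peaks are paired greedily (largest intensity product first) when their
    m/z values differ by at most mz_tol; each peak is used at most once.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= mz_tol:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    norm_a = math.sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Because only peaks at the same m/z contribute, fragments shifted by a structural modification add nothing to this score, which is consistent with the alignment failures the abstract describes.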


Subjects
Metabolomics , Tandem Mass Spectrometry , Algorithms , Ions
11.
Metabolites ; 12(3), 2022 Feb 26.
Article in English | MEDLINE | ID: mdl-35323655

ABSTRACT

Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data has been a long-standing bioinformatic challenge in untargeted metabolomics. Conventional feature extraction algorithms fail to recognize features with low signal intensities, poor chromatographic peak shapes, or those that do not fit the parameter settings. This problem also poses a challenge for MS-based exposome studies, as low-abundance metabolic or exposomic features cannot be automatically recognized from raw data. To address this data processing challenge, we developed an R package, JPA (short for Joint Metabolomic Data Processing and Annotation), to comprehensively extract metabolic features from raw LC-MS data. JPA performs feature extraction by combining a conventional peak picking algorithm and strategies for (1) recognizing features with bad peak shapes but that have tandem mass spectra (MS2) and (2) picking up features from a user-defined targeted list. The performance of JPA in global metabolomics was demonstrated using serially diluted urine samples, in which JPA was able to rescue an average of 25% of metabolic features that were missed by the conventional peak picking algorithm due to dilution. More importantly, the chromatographic peak shapes, analytical accuracy, and precision of the rescued metabolic features were all evaluated. Furthermore, owing to its sensitive feature extraction, JPA was able to achieve a limit of detection (LOD) that was up to thousands of times lower when automatically processing metabolomics data of a serially diluted metabolite standard mixture analyzed in HILIC(-) and RP(+) modes. Finally, the performance of JPA in exposome research was validated using a mixture of 250 drugs and 255 pesticides at environmentally relevant levels. JPA detected an average of 2.3-fold more exposure compounds than conventional peak picking only.

12.
Anal Chim Acta ; 1200: 339613, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35256147

ABSTRACT

Collision-induced dissociation (CID) is a common fragmentation strategy in tandem mass spectrometry (MS2) analysis. A conventional understanding is that fragment ions generated in low-energy CID should follow the even-electron rule. As such, (de)protonated ([M+H]+/[M-H]-) or even-electron precursor ions should follow heterolytic cleavages and predominantly generate even-electron fragment ions with very few radical fragment ions (RFIs). However, the extent to which RFIs are present in MS2 spectra has not been comprehensively investigated. This work uses the annotated high-resolution MS2 spectra from the latest NIST 20 tandem mass spectral library to investigate the occurrence of RFIs in CID MS2 experiments. In particular, RFIs were recognized using integer double bond equivalent (DBE) values calculated from their annotated molecular formulas. Our study shows that 65.4% and 68.8% of MS2 spectra of even-electron precursors contain at least 10% RFIs by ion-count (total number of ions) in positive and negative electrospray ionization modes, respectively. Furthermore, we classified chemicals based on their compound classes and chemical substructures, and calculated the percentages of RFIs in each class. As expected, compounds that can stabilize the radical site via resonance, such as aromatic and conjugated double bond-containing chemicals, are more likely to form RFIs. We also found four possible patterns of change in RFI percentages as a function of CID collision energy. Finally, we demonstrate that the inadequate consideration of RFIs in most conventional bioinformatic tools might be problematic during in silico fragmentation and de novo annotation of MS2 spectra. This work provides a further understanding of CID MS2 mechanisms, and the unexpectedly large percentage of RFIs suggests that the even-electron rule is disobeyed in numerous cases.
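The integer-DBE criterion mentioned in the abstract can be illustrated with the usual valence-based formula, DBE = C - (H + X)/2 + (N + P)/2 + 1, applied to the formula of the ion itself: even-electron ions give half-integer values, while odd-electron (radical) ions give integers. The sketch below is an assumption-laden illustration, not the authors' code. The function names are hypothetical, and only C, H, N, O, S, P, and the halogens are handled, with S and O treated as divalent and P as trivalent.

```python
import re

def parse_formula(formula):
    """Parse a simple formula such as 'C6H8N' into {'C': 6, 'H': 8, 'N': 1}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def dbe(ion_formula):
    """Double bond equivalent from element counts (simple valence rules)."""
    c = parse_formula(ion_formula)
    monovalent = sum(c.get(e, 0) for e in ("H", "F", "Cl", "Br", "I"))
    trivalent = c.get("N", 0) + c.get("P", 0)
    return c.get("C", 0) - monovalent / 2 + trivalent / 2 + 1

def is_radical_ion(ion_formula):
    """An integer DBE for an ion formula indicates an odd-electron ion."""
    return float(dbe(ion_formula)).is_integer()
```

For example, protonated aniline (ion formula C6H8N) gives a DBE of 3.5 and is classified as even-electron, whereas a C6H6 radical cation gives 4.0. The rule only makes sense for ion formulas, since neutral molecules also have integer DBE values.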


Subjects
Electrospray Ionization Mass Spectrometry , Tandem Mass Spectrometry , Electrons , Ions , Electrospray Ionization Mass Spectrometry/methods , Tandem Mass Spectrometry/methods
13.
Anal Chem ; 93(36): 12181-12186, 2021 Sep 14.
Article in English | MEDLINE | ID: mdl-34455775

ABSTRACT

Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data relies on the recognition of extracted ion chromatogram (EIC) peak shapes using peak picking algorithms. Unfortunately, all peak picking algorithms present a significant drawback of generating a problematic number of false positives. In this work, we take advantage of deep learning technology to develop a convolutional neural network (CNN)-based program that can automatically recognize metabolic features with poor EIC shapes, which are of low feature fidelity and more likely to be false. Our CNN model was trained using 25,095 EIC plots collected from 22 LC-MS-based metabolomics projects spanning various sample types and LC and MS conditions. Notably, we manually inspected all the EIC plots to assign good or poor EIC quality for accurate model training. The trained CNN model is embedded into a C#-based program, named EVA (short for evaluation). The EVA Windows Application is a versatile platform that can process metabolic features generated by LC-MS systems of various vendors and processed using various data processing software. Our comprehensive evaluation of EVA indicates that it achieves over 90% classification accuracy. EVA can be readily used in LC-MS-based metabolomics projects and is freely available on the Microsoft Store by searching "EVA Metabolomics".


Subjects
Deep Learning , Algorithms , Liquid Chromatography , Mass Spectrometry , Metabolomics
14.
Anal Chem ; 93(29): 10243-10250, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-34270210

ABSTRACT

In-source fragmentation (ISF) is a naturally occurring phenomenon during electrospray ionization (ESI) in liquid chromatography-mass spectrometry (LC-MS) analysis. ISF leads to false metabolite annotation in untargeted metabolomics, prompting misinterpretation of the underlying biological mechanisms. Conventional metabolomic data cleaning mainly focuses on the annotation of adducts and isotopes, and the recognition of ISF features is mainly based on common neutral losses and the LC coelution pattern. In this work, we recognized three increasingly important patterns of ISF features, including (1) coeluting with their precursor ions, (2) being in the tandem MS (MS2) spectra of their precursor ions, and (3) sharing similar MS2 fragmentation patterns with their precursor ions. Based on these patterns, we developed an R package, ISFrag, to comprehensively recognize all possible ISF features from LC-MS data generated from full-scan, data-dependent acquisition, and data-independent acquisition modes without the assistance of common neutral loss information or MS2 spectral library. Tested using metabolite standards, we achieved a 100% correct recognition of level 1 ISF features and over 80% correct recognition for level 2 ISF features. Further application of ISFrag on untargeted metabolomics data allows us to identify ISF features that can potentially cause false metabolite annotation at an omics-scale. With the help of ISFrag, we performed a systematic investigation of how ISF features are influenced by different MS parameters, including capillary voltage, end plate offset, ion energy, and "collision energy". Our results show that while increasing energies can increase the number of real metabolic features and ISF features, the percentage of ISF features might not necessarily increase. Finally, using ISFrag, we created an ISF pathway to visualize the relationships between multiple ISF features that belong to the same precursor ion. ISFrag is freely available on GitHub (https://github.com/HuanLab/ISFrag).


Subjects
Metabolomics , Tandem Mass Spectrometry , Liquid Chromatography , Gene Library , Ions
15.
Front Chem ; 9: 674265, 2021.
Article in English | MEDLINE | ID: mdl-34055742

ABSTRACT

Hair is a unique biological matrix that adsorbs short-term exposures (e.g., environmental contaminants and personal care products) on its surface and also embeds endogenous metabolites and long-term exposures in its matrix. In this work, we developed an untargeted metabolomics workflow to profile both temporal exposure chemicals and endogenous metabolites in the same hair sample. This analytical workflow begins with the extraction of short-term exposures from hair surfaces through washing. Further development of mechanical homogenization extracts endogenous metabolites and long-term exposures from the cleaned hair. Both solutions of hair wash and hair extract were analyzed using ultra-high-performance liquid chromatography-high-resolution mass spectrometry (UHPLC-HRMS)-based metabolomics for global-scale metabolic profiling. After analysis, raw data were processed using bioinformatic programs recently developed specifically for exposome research. Using optimized experimental conditions, we detected a total of 10,005 and 9,584 metabolic features from hair wash and extraction samples, respectively. Among them, 274 and 276 features can be definitively confirmed by MS2 spectral matching against a spectral library, and an additional 3,356 and 3,079 features were tentatively confirmed as biotransformation metabolites. To demonstrate the performance of our hair metabolomics, we collected hair samples from three female volunteers and tested their hair metabolic changes before and after a 2-day exposure exercise. Our results show that 645 features from wash and 89 features from extract were significantly changed from the 2-day exposure. Altogether, this work provides a novel analytical approach to study the hair metabolome and exposome at a global scale, which can be implemented in a wide range of biological applications for a deeper understanding of the impact of environmental and genetic factors on human health.

16.
Environ Health Perspect ; 129(4): 47014, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33929905

ABSTRACT

BACKGROUND: Due to the ubiquitous use of chemicals in modern society, humans are increasingly exposed to thousands of chemicals that contribute to a major portion of the human exposome. Should a comprehensive and risk-based human exposome database be created, it would be conducive to the rapid progress of human exposomics research. In addition, once a xenobiotic is biotransformed with distinct half-lives upon exposure, monitoring the parent compounds alone may not reflect the actual human exposure. To address these questions, a comprehensive and risk-prioritized human exposome database is needed. OBJECTIVES: Our objective was to set up a comprehensive risk-prioritized human exposome database including physicochemical properties as well as risk prediction and develop a graphical user interface (GUI) that has the ability to conduct searches for content associated with chemicals in our database. METHODS: We built a comprehensive risk-prioritized human exposome database by text mining and database fusion. Subsequently, chemicals were prioritized by integrating exposure level obtained from the Systematic Empirical Evaluation of Models with toxicity data predicted by the Toxicity Estimation Software Tool and the Toxicological Priority Index calculated from the ToxCast database. The biotransformation half-lives (HLBs) of all the chemicals were assessed using the Iterative Fragment Selection approach and biotransformation products were predicted using the previously developed BioTransformer machine-learning method. RESULTS: We compiled a human exposome database of >20,000 chemicals, prioritized 13,441 chemicals based on probabilistic hazard quotient and 7,770 chemicals based on risk index, and provided a predicted biotransformation metabolite database of >95,000 metabolites. In addition, a user-interactive Java software (Oracle)-based search GUI was generated to enable open access to this new resource. DISCUSSION: Our database can be used to guide chemical management and enhance scientific understanding to rapidly and effectively prioritize chemicals for comprehensive biomonitoring in epidemiological investigations. https://doi.org/10.1289/EHP7722.


Subjects
Exposome , Data Management , Data Mining , Factual Databases , Environmental Exposure , Humans
17.
Anal Chem ; 93(14): 5735-5743, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33784068

ABSTRACT

Despite the vast amount of metabolic information that can be captured in untargeted metabolomics, many biological applications are looking for a biology-driven metabolomics platform that targets a set of metabolites that are relevant to the given biological question. Steroids are a class of important molecules that play critical roles in many physiological systems and diseases. Besides known steroids, there are a large number of unknown steroids that have not been reported in the literature. The ability to rapidly detect and quantify both known and unknown steroid molecules in a biological sample can greatly accelerate a broad range of steroid-focused life science research. This work describes the development and application of SteroidXtract, a convolutional neural network (CNN)-based bioinformatics tool that can recognize steroid molecules in mass spectrometry (MS)-based untargeted metabolomics using their unique tandem MS (MS2) spectral patterns. SteroidXtract was trained using a comprehensive set of standard MS2 spectra from MassBank of North America (MoNA) and an in-house steroid library. Data augmentation strategies, including intensity thresholding and Gaussian noise addition, were created and applied to minimize data overfitting caused by the limited number of standard steroid MS2 spectra. The CNN model embedded in SteroidXtract was further compared with random forest and XGBoost using nested cross-validations to demonstrate its performance. Finally, SteroidXtract was applied in several metabolomics studies to demonstrate its sensitivity, specificity, and robustness. Compared to conventional statistics-driven metabolomics data interpretation, our work offers a novel automated biology-driven approach to interpreting untargeted metabolomics data, prioritizing biologically important molecules with high throughput and sensitivity.
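The two augmentation strategies named in the abstract, intensity thresholding and Gaussian noise addition, can be sketched for a single MS2 spectrum as follows. This is not the SteroidXtract implementation: the function name, the relative threshold, the noise scale, and the order of the two steps are all hypothetical choices made for illustration.

```python
import random

def augment_spectrum(spectrum, rel_threshold=0.01, noise_sd=0.02, rng=None):
    """Return an augmented copy of a spectrum given as (mz, intensity) pairs.

    1. Intensity thresholding: drop peaks below rel_threshold * base peak.
    2. Gaussian noise: perturb each remaining intensity with zero-mean noise
       whose standard deviation is noise_sd * base peak (clipped at zero).
    Both parameter values are illustrative assumptions.
    """
    rng = rng or random.Random()
    base_peak = max(intensity for _, intensity in spectrum)
    kept = [(mz, intensity) for mz, intensity in spectrum
            if intensity >= rel_threshold * base_peak]
    return [(mz, max(0.0, intensity + rng.gauss(0.0, noise_sd * base_peak)))
            for mz, intensity in kept]
```

Calling the function several times with different random generators on each standard spectrum yields multiple slightly different training examples, which is the general idea behind using augmentation to limit overfitting on a small reference set.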


Subjects
Deep Learning , Computational Biology , Metabolomics , Steroids , Tandem Mass Spectrometry
18.
J Am Soc Mass Spectrom ; 32(9): 2296-2305, 2021 Sep 01.
Article in English | MEDLINE | ID: mdl-33739814

ABSTRACT

Tandem mass spectral (MS/MS) data in liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis are often contaminated as the selection of precursor ions is based on a low-resolution quadrupole mass filter. In this work, we developed a strategy to differentiate contamination fragment ions (CFIs) from true fragment ions (TFIs) in an MS/MS spectrum. The rationale is that TFIs should coelute with their parent ions, but CFIs should not. To assess coelution, we performed a parallel LC-MS/MS analysis in data-independent acquisition (DIA) with all-ion-fragmentation (AIF) mode. Using the DIA (AIF) data, a peak-peak correlation (PPC) score is calculated between the extracted ion chromatogram (EIC) of the fragment ion using the MS/MS scans and the EIC of the precursor ion using the MS1 scans. A high PPC score is an indication of TFIs, and a low PPC score is an indication of CFIs. Tested using metabolomics data generated by high-resolution QTOF and Orbitrap MS from various vendors in different LC-MS configurations, we found that more than 70% of the fragment ions have PPC scores < 0.8 and identified three common sources of CFIs, including (1) solvent contamination, (2) adjacent chemical contamination, and (3) undetermined signals from artifacts and noise. Combining PPC scores with other precursor and fragment ion information, we further developed a machine learning model that can robustly and conservatively predict CFIs. Incorporating the machine learning model, we created an R program, MS2Purifier, to automatically recognize CFIs and clean MS/MS spectra of metabolic features in LC-MS/MS data with high sensitivity and specificity.
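The coelution idea behind the PPC score can be sketched in Python as follows. This is a rough illustration, not the MS2Purifier code: it assumes the PPC score is a Pearson correlation, that both EICs have already been sampled at the same retention-time points, and that a fixed 0.8 cutoff separates the two classes. The abstract reports 0.8 only as a descriptive figure and uses a machine learning model for the actual prediction.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("traces must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trace carries no coelution information
    return sxy / math.sqrt(sxx * syy)


def classify_fragment(precursor_eic, fragment_eic, cutoff=0.8):
    """Label a fragment as a true (TFI) or contamination (CFI) fragment ion.

    Assumption: a single hypothetical cutoff on the correlation is used here
    purely for illustration.
    """
    score = pearson(precursor_eic, fragment_eic)
    return score, ("TFI" if score >= cutoff else "CFI")


if __name__ == "__main__":
    # Made-up EICs sampled at the same seven retention-time points.
    precursor = [0, 10, 50, 100, 50, 10, 0]
    coeluting = [0, 2, 10, 20, 10, 2, 0]       # same peak shape, lower abundance
    background = [20, 18, 15, 12, 10, 8, 5]    # drifting signal, no peak
    print(classify_fragment(precursor, coeluting))
    print(classify_fragment(precursor, background))
```

A fragment that rises and falls with its precursor scores close to 1, whereas a drifting background signal scores near zero and would be flagged as a likely contaminant.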

19.
Anal Chem ; 93(4): 2669-2677, 2021 Feb 02.
Article in English | MEDLINE | ID: mdl-33465307

ABSTRACT

Existing data acquisition modes such as full-scan, data-dependent (DDA), and data-independent acquisition (DIA) often present limited capabilities in capturing metabolic information in liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. In this work, we proposed a novel metabolomic data acquisition workflow that combines DDA and DIA analyses to achieve better metabolomic data quality, including enhanced metabolome coverage, tandem mass spectrometry (MS2) coverage, and MS2 quality. This workflow, named data-dependent-assisted data-independent acquisition (DaDIA), performs untargeted metabolomic analysis of individual biological samples using DIA mode and the pooled quality control (QC) samples using DDA mode. This combination takes advantage of the high feature number and MS2 spectral coverage of the DIA data and the high MS2 spectral quality of the DDA data. To analyze the heterogeneous DDA and DIA data, we further developed a computational program, DaDIA.R, to automatically extract metabolic features and perform streamlined metabolite annotation of DaDIA data sets. Using human urine samples, we demonstrated that the DaDIA workflow delivers remarkably improved data quality when compared to conventional DDA or DIA metabolomics. In particular, both the number of detected features and annotated metabolites were greatly increased. Further biological demonstration using a leukemia metabolomics study also proved that the DaDIA workflow can efficiently detect and annotate around 4 times more significant metabolites than the DDA workflow with broad MS2 coverage and high MS2 spectral quality for downstream statistical analysis and biological interpretation. Overall, this work represents a critical development of data acquisition mode in untargeted metabolomics, which can greatly benefit untargeted metabolomics for a wide range of biological applications.


Subjects
Data Accuracy , Metabolomics/methods , Software , Humans , Leukemia/metabolism , Metabolome , Urinalysis , Workflow
20.
Appl Environ Microbiol ; 87(5), 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33355101

ABSTRACT

Endospore formation is used by members of the phylum Firmicutes to withstand extreme environmental conditions. Several recent studies have proposed endospore formation in species outside of Firmicutes, particularly in Rhodobacter johrii and Serratia marcescens, members of the phylum Proteobacteria. Here, we aimed to investigate endospore formation in these two species by using advanced imaging and analytical approaches. Examination of the phase-bright structures observed in R. johrii and S. marcescens using cryo-electron tomography failed to identify endospores or stages of endospore formation. We determined that the phase-bright objects in R. johrii cells were triacylglycerol storage granules and those in S. marcescens were aggregates of cellular debris. In addition, R. johrii and S. marcescens containing phase-bright objects do not possess phenotypic and genetic features of endospores, including enhanced resistance to heat, presence of dipicolinic acid, or the presence of many of the genes associated with endospore formation. Our results support the hypothesis that endospore formation is restricted to the phylum Firmicutes. Importance: Bacterial endospore formation is an important process that allows the formation of dormant life forms called spores. As such, organisms able to sporulate can survive harsh environmental conditions for hundreds of years. Here, we follow up on previous claims that two members of Proteobacteria, Serratia marcescens and Rhodobacter johrii, are able to form spores. We conclude that those claims were incorrect and show that the putative spores in R. johrii and S. marcescens are storage granules and cellular debris, respectively. This study concludes that endospore formation is still unique to the phylum Firmicutes.
